RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate

Authors

Gang ZHAO

Ruoying SUN

Abstract

Reinforcement learning is an efficient method for solving Markov Decision Processes in which an agent improves its performance using scalar reward values, giving it a high capability for reactive and adaptive behavior. Q-learning is a representative reinforcement learning method that is guaranteed to obtain an optimal policy but needs numerous trials to achieve it. The k-Certainty Exploration Learning System explores an environment actively, but its learning process is separated into two phases, and value estimates are not derived while the environment is being identified. The Dyna-Q architecture makes fuller use of a limited amount of experience and, by learning and planning within constrained time, achieves a better policy with fewer environment interactions while identifying an environment; however, its exploration is not active. This paper proposes the RTP-Q reinforcement learning system, which turns an efficient method for exploring an environment into time-constrained exploration planning and combines it into an integrated system of learning, planning, and reacting, aiming for the best of both methods. By improving exploration of the environment and refining the model of the environment, the RTP-Q learning system accelerates the rate at which an optimal policy is learned. Experimental results on navigation tasks demonstrate that the RTP-Q learning system is efficient.

Key words: reinforcement learning, planning, reacting, exploration, exploitation
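The Dyna-Q idea mentioned in the abstract (direct Q-learning updates interleaved with a bounded number of planning updates drawn from a learned model) can be sketched as follows. This is a minimal illustration of standard tabular Dyna-Q, not the RTP-Q system itself, whose details are not given on this page. The corridor environment, the function name `dyna_q`, and all parameter values are assumptions chosen for illustration; the fixed `planning_steps` budget stands in for a planning time constraint.

```python
import random
from collections import defaultdict


def dyna_q(n_states=6, episodes=30, planning_steps=5,
           alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical 1-D corridor (reward 1 at the right end)."""
    rng = random.Random(seed)
    actions = (-1, +1)            # move left / move right
    q = defaultdict(float)        # Q[(state, action)], initialised to 0
    model = {}                    # learned model: (state, action) -> (reward, next_state)
    goal = n_states - 1

    def step(s, a):
        # Assumed deterministic environment with walls at both ends.
        s2 = min(max(s + a, 0), goal)
        return (1.0 if s2 == goal else 0.0), s2

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2):
        # One-step Q-learning backup; the goal state is terminal.
        target = r if s2 == goal else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection (passive, undirected exploration).
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)           # learning from real experience
            model[(s, a)] = (r, s2)       # refine the model of the environment
            for _ in range(planning_steps):
                # Planning: replay a remembered transition from the model.
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q
```

Calling `dyna_q()` returns the learned action-value table; raising `planning_steps` trades more computation per real step for fewer environment interactions, which is the trade-off the abstract attributes to Dyna-Q.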

Similar resources

Reinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic

In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...

Mini/Micro-Grid Adaptive Voltage and Frequency Stability Enhancement Using Q-learning Mechanism

This paper develops an adaptive control method for controlling the frequency and voltage of an islanded mini/micro grid (M/µG) using a reinforcement learning method. Reinforcement learning (RL) is one of the branches of machine learning and is the main solution method for Markov decision processes (MDPs). Among the several solution methods of RL, the Q-learning method is used for solving RL in th...

Probabilistic Exploration in Planning while Learning

Sequential decision tasks with incomplete information are characterized by the exploration problem; namely the trade-off between further exploration for learning more about the environment and immediate exploitation of the accrued information for decision-making. Within artificial intelligence, there has been an increasing interest in studying planning-while-learning algorithms for these ...

An Advance Q Learning (AQL) Approach for Path Planning and Obstacle Avoidance of a Mobile Robot

The goal of this paper is to improve the performance of the well-known Q learning algorithm, a robust machine learning technique, to facilitate path planning in an environment. Until now, Q learning algorithms such as the Classical Q learning (CQL) algorithm and the Improved Q learning (IQL) algorithm have dealt with an environment without obstacles, while in a real environment an agent has to face o...

Design of a PSS3B stabilizer based on the KH algorithm and Q-learning for damping low-frequency oscillations of a single-machine power system

The main purpose of this paper is to develop a supplementary signal using reinforcement learning (RL) to improve the performance of a power system stabilizer (PSS). RL is one of the most important topics in the field of artificial intelligence and is a popular method for solving Markov decision processes (MDPs). In this paper, a control method is developed based on Q-learning and used to improve...

Publication date: 1999
